Abstract

Consider a countably infinite set of nodes, which sequentially make decisions between two given
hypotheses. Each node takes a measurement of the underlying truth, observes the decisions from some
immediate predecessors, and makes a decision between the given hypotheses. We consider two classes of
broadcast failures: 1) each node broadcasts a decision to the other nodes, subject to random erasure in the
form of a binary erasure channel; 2) each node broadcasts a randomly flipped decision to the other nodes
in the form of a binary symmetric channel. We are interested in whether there exists a decision strategy
consisting of a sequence of likelihood ratio tests such that the node decisions converge in probability to
the underlying truth. In both cases, we show that if each node only learns from a bounded number of
immediate predecessors, then there does not exist a decision strategy such that the decisions converge
in probability to the underlying truth. However, in case 1, we show that if each node learns from an
unboundedly growing number of predecessors, then the decisions converge in probability to the underlying
truth, even when the erasure probabilities converge to 1. We also derive the convergence rate of the error
probability. In case 2, we show that if each node learns from all of its previous predecessors, then the
decisions converge in probability to the underlying truth when the flipping probabilities of the binary
symmetric channels are bounded away from 1/2. In the case where the flipping probabilities converge to
1/2, we derive a necessary condition on the convergence rate of the flipping probabilities such that the
decisions still converge to the underlying truth. We also explicitly characterize the relationship between
the convergence rate of the error probability and the convergence rate of the flipping probabilities.

Index Terms 

Asymptotic learning, decentralized detection, erasure channel, herding, social learning, symmetric 
channel. 

I. Introduction 

We consider a countably infinite set of nodes $\{a_1, a_2, \ldots\}$, which sequentially make decisions between
two hypotheses $H_0$ and $H_1$. At stage $k$, node $a_k$ takes a measurement $X_k$ (called its private signal),
receives the decisions of $m_k < k$ immediate predecessors, and makes a binary decision $d_k = 0$ or $1$ about
the prevailing hypothesis $H_0$ or $H_1$, respectively. It then broadcasts a decision to its successors. Note that
$m_k$ is often referred to as the memory size. A typical question is this: Can these nodes asymptotically
learn the underlying true hypothesis? In other words, does the decision $d_k$ converge (in probability) to
the true hypothesis as $k \to \infty$? If so, what is the convergence rate of the error probability?

One application of the sequential hypothesis testing problem is decentralized detection in sensor 
networks, in which case the set of nodes represents a set of spatially distributed sensors attempting 
to jointly solve the hypothesis testing problem. Due to limited resources for processing and transmitting 
data, each sensor aggregates its measurement and the observed decisions from the previous sensors into 
a much smaller message (e.g., a 1-bit decision) and then sends it to other sensors for further aggregation. 
A central question is whether we can design a sequence of decision rules to aggregate the spatially 
distributed information such that the decisions converge to the underlying truth as the number of sensors 
increases. 

Another application is social learning in multi-agent networks, in which case the set of nodes represents 
a set of agents trying to learn the underlying truth (also known as the state of the world). Each agent 
makes a decision based on its own measurement and what it learns from the actions/decisions of the 
previous agents. In this case, we usually assume that each agent uses a myopic decision rule to minimize 
a local objective function; for example, the probability of error is locally minimized using the Bayesian 
likelihood ratio test with a threshold given by the ratio of the prior probabilities. The question in this 
setting is whether the agents in the social network can asymptotically learn the state of the world.

A. Related Work

The literature on hypothesis testing in decentralized networks is vast, spanning various disciplines 
including signal processing, game theory, information theory, economics, biology, physics, computer 
science, and statistics. Here we only review the relevant asymptotic learning results in the network 
structure relevant to this paper. 

The research on our problem begins with a seminal paper by Cover [1], which considers the case
where each node only observes the decision from its immediate previous node, i.e., $m_k = 1$ for all $k$.
This structure is also known as a serial network or tandem network and has been studied extensively
in [1]-[14]. We use $\mathbb{P}_j$ and $\pi_j$ to denote the probability measure and the prior probability associated with
$H_j$, $j = 0, 1$, respectively. Cover [1] shows that if the (log)-likelihood ratio for each private signal $X_k$ is
bounded almost surely, i.e., there exists a positive constant $C$ such that

$$\left| \log \frac{d\mathbb{P}_1}{d\mathbb{P}_0}(X_k) \right| < C$$

almost surely, then using a sequence of likelihood ratio tests the (Bayesian) error probability
$P_e^k = \pi_0 \mathbb{P}_0(d_k = 1) + \pi_1 \mathbb{P}_1(d_k = 0)$ does not converge to 0 as $k \to \infty$. Conversely, if the
likelihood ratio is unbounded, then the error probability converges to 0. In the case of unbounded
likelihood ratios for the private signals, Veeravalli [8] shows that the error probability converges
sub-exponentially with respect to the number $k$ of nodes in the case where the private signals follow
i.i.d. Gaussian distributions. Tay et al. [10] show that the convergence of the error probability is always
sub-exponential and derive a lower bound for the convergence rate of the error probability in the tandem
network. Lobel et al. [11] derive a lower bound for the convergence rate in the case where each node learns
randomly from one previous node (not necessarily its immediate predecessor). In the case of bounded
likelihood ratios, Drakopoulos et al. [12] provide a non-Bayesian decision strategy, which results in
convergence of the error probability.

Another extreme scenario is that each node can observe all the previous decisions; i.e., $m_k = k - 1$ for
all $k$. This scenario was first studied in the context of social learning [15], [16], where each node uses the
Bayesian likelihood ratio test to make its decision. In the case of bounded likelihood ratios for the private
signals, the authors of [15] and [16] show that the error probability does not converge to 0, which results
in arriving at the wrong decision with positive probability. In [17], we show that in balanced binary trees,
the decisions converge to the right decision even if the likelihood ratios of signals converge to 1 as the
number of nodes increases. We further studied in [18] the convergence rate of the error probability in
more general tree structures. In the case of unbounded likelihood ratios for the private signals, Smith
and Sorensen [19] study this problem using martingales and show that the error probability converges to
0. Krishnamurthy [20], [21] studies this problem from the perspective of quickest time change detection.
Chamley [22] provides a convergence rate analysis of the error probability in these structures. Acemoglu
et al. [23] show that the nodes can asymptotically learn the underlying truth in more general network
structures.

Most previous work, including that reviewed above, assumes that the nodes and links are perfect. We
study the sequential hypothesis testing problem when broadcasts are subject to random erasure or random
flipping.

B. Contributions 

In this paper, we assume that each node uses a likelihood ratio test to generate its binary decision. 
We call the sequence of likelihood ratio tests a decision strategy. We want to know whether or not there 
exists a decision strategy such that the node decisions converge in probability to the underlying true 
hypothesis. We consider two classes of broadcast failures: 

1) Random erasure: Each broadcasted decision is erased with a certain erasure probability, modeled by 
a binary erasure channel. If the decision broadcasted by a node is erased, then none of its successors 
will observe that decision. 

2) Random flipping: Each broadcasted decision is flipped with a certain flipping probability, modeled by 
a binary symmetric channel. If the broadcasted decision of a node is flipped, then all the successors 
of that node observe that flipped decision. 
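
The two failure classes above can be sketched as two small channel functions. This is only an illustrative sketch: the function names, the `ERASED` symbol, and the example probabilities are our own choices and are not taken from the paper. Each call models a single broadcast, whose (possibly corrupted) output is then seen by all successors.

```python
import random

ERASED = "e"  # symbol for an erased (non-decodable) broadcast; name is illustrative

def erasure_channel(decision, erasure_prob, rng):
    """Binary erasure channel: return ERASED with probability erasure_prob,
    otherwise pass the decision (0 or 1) through unchanged."""
    return ERASED if rng.random() < erasure_prob else decision

def symmetric_channel(decision, flip_prob, rng):
    """Binary symmetric channel: flip the decision with probability flip_prob."""
    return 1 - decision if rng.random() < flip_prob else decision

# Example: broadcast the decision 1 ten times through each channel.
rng = random.Random(0)
received_after_erasure = [erasure_channel(1, 0.3, rng) for _ in range(10)]
received_after_flipping = [symmetric_channel(1, 0.3, rng) for _ in range(10)]
```

With erasure, a receiver knows which decisions were lost; with flipping, it cannot tell a flipped decision from an intact one, which is why the two cases are analyzed separately.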

For case 1, we show that if each node can only learn from a bounded number of immediate predecessors,
i.e., there exists a constant $C$ such that $m_k \le C$ for all $k$, then for any decision strategy, the error
probability cannot converge to 0. We also show that if $m_k \to \infty$ as $k \to \infty$, then there exists a decision
strategy such that the error probability converges to 0, even if the erasure probability converges to 1
(given that the convergence of the erasure probability is slower than a certain rate). In the case where an
agent learns from all its predecessors, the convergence rate of the error probability is $\Theta(1/\sqrt{k})$. More
specifically, we show that if the memory size $m_k = \Theta(k^{\alpha})$, $\alpha \le 1$, then the error probability decreases
as $\Theta(1/k^{\min(\alpha, 1/2)})$.

For case 2, we show that if each node can only learn from a bounded number of immediate predecessors,
then for any decision strategy, the error probability cannot converge to 0. We also show that if each node
can learn from all the previous nodes, i.e., $m_k = k - 1$, then the error probability converges to 0 using
the myopic decision strategy when the flipping probabilities are bounded away from 1/2. In this case,
we show that the error probability converges to 0 as $\Omega(k^{-2})$. In the case where the flipping probability
converges to 1/2, we derive a necessary condition on the convergence rate of the flipping probability
(i.e., how fast it must converge) such that the error probability converges to 0. More specifically, we show
that if there exists $p > 1$ such that the flipping probability converges to 1/2 as $O(1/(k (\log k)^p))$, then it is
impossible that the error probability converges to 0. Therefore, only if the flipping probability converges
as $\Omega(1/(k (\log k)^p))$ for some $p \le 1$ can we hope for $P_e^k \to 0$. Under this condition, we characterize
explicitly the relationship between the convergence rate of the flipping probability and the convergence
rate of the error probability.



II. Preliminaries

We use $\mathbb{P}$ to denote the underlying probability measure. We use $\pi_j$ to denote the prior probability
(assumed nonzero), $\mathbb{P}_j$ to denote the probability measure, and $\mathbb{E}_j$ to denote the conditional expectation
associated with $H_j$, $j = 0, 1$. At stage $k$, node $a_k$ takes a measurement $X_k$ of the scene and makes a decision
$d_k = 0$ or $d_k = 1$ about the prevailing hypothesis $H_0$ or $H_1$. It then broadcasts a potentially corrupted
form $\hat{d}_k$ of that decision to its successors. Note that in case 1, if the decision is erased, it is equivalent to
saying that the corrupted decision $\hat{d}_k$ is $e$, which is a message that cannot be decoded by the receivers. The
decision $d_k$ of node $a_k$ is made based on the private signal $X_k$ and the sequence of corrupted decisions
$\hat{D}_{m_k} = \{\hat{d}_1, \hat{d}_2, \ldots, \hat{d}_{m_k}\}$ received from the $m_k$ immediate predecessor nodes using a likelihood ratio
test.

Our aim is to find a sequence of likelihood ratio tests such that the probability of making a wrong
decision about the state of the world tends to 0 as $k \to \infty$; i.e.,

$$\lim_{k \to \infty} P_e^k = \lim_{k \to \infty} \left( \pi_0 \mathbb{P}_0(d_k = 1) + \pi_1 \mathbb{P}_1(d_k = 0) \right) = 0.$$

Before proceeding, we introduce the following definitions and assumptions:

1) The two probability measures $\mathbb{P}_0$ and $\mathbb{P}_1$ are absolutely continuous with respect to each other. To
avoid trivialities, we exclude the case where these two measures are equal almost everywhere; i.e.,
assume that $D(\mathbb{P}_0 \| \mathbb{P}_1) > 0 > -D(\mathbb{P}_1 \| \mathbb{P}_0)$, where $D$ denotes the Kullback-Leibler divergence.

2) The private signal $X_k$ is independent of the broadcast history $\hat{D}_{m_k}$ and the $X_k$'s are mutually
independent and identically distributed under each of $H_0$ and $H_1$.

3) Let the likelihood ratio of the private signal $X_k$ be

$$L(X_k) = \frac{d\mathbb{P}_1}{d\mathbb{P}_0}(X_k).$$

We assume that the likelihood ratios for the private signals are unbounded; i.e., for all $C > 0$,

$$\mathbb{P}(|\log L(X_k)| > C) > 0.$$

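4) Suppose that $\theta$ is the underlying truth. Let $b_k = \mathbb{P}(\theta = H_1 | X_k)$, which we call the private belief of
$a_k$. By Bayes' rule, we have

$$b_k = \frac{\pi_1 \, d\mathbb{P}_1(X_k)/dX_k}{\pi_1 \, d\mathbb{P}_1(X_k)/dX_k + \pi_0 \, d\mathbb{P}_0(X_k)/dX_k}
     = \left(1 + \frac{\pi_0 \, d\mathbb{P}_0}{\pi_1 \, d\mathbb{P}_1}(X_k)\right)^{-1}
     = \left(1 + \frac{\pi_0}{\pi_1} L(X_k)^{-1}\right)^{-1}. \qquad (1)$$

5) For node $a_k$, the likelihood ratio of the observed decisions is

$$L(\hat{D}_{m_k}) = \frac{\mathbb{P}_1(\hat{D}_{m_k})}{\mathbb{P}_0(\hat{D}_{m_k})}.$$

6) Let $\hat{b}_k = \mathbb{P}(\theta = H_1 | \hat{D}_{m_k})$, which we call the public belief of $a_k$. We have

$$\hat{b}_k = \left(1 + \frac{\pi_0}{\pi_1} L(\hat{D}_{m_k})^{-1}\right)^{-1}. \qquad (2)$$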

7) Each node makes its decision based on a likelihood ratio test with a threshold $t_k > 0$:

$$d_k = \begin{cases} 1 & \text{if } L(X_k) L(\hat{D}_{m_k}) > t_k, \\ 0 & \text{if } L(X_k) L(\hat{D}_{m_k}) \le t_k. \end{cases}$$

If $t_k = \pi_0/\pi_1$, then this test becomes the maximum a-posteriori probability (MAP) test, in which
case the probability of error is locally minimized for node $a_k$. If $t_k = 1$, then the test becomes
the maximum-likelihood (ML) test. If the prior probabilities are equal, then these two tests are
identical. A decision strategy $T$ is a sequence of likelihood ratio tests with thresholds $\{t_k\}_{k=1}^{\infty}$.
Given a decision strategy, the decision sequence $\{d_k\}_{k=1}^{\infty}$ is a stochastic process, described by the
probability measure $\mathbb{P}_T$. Note that we use $\mathbb{P}$ to denote the measure if the strategy is fixed or the
event does not depend on the decision strategy.

8) We say that the system asymptotically learns the underlying true hypothesis with decision strategy
$T$ if

$$\lim_{k \to \infty} \mathbb{P}_T(d_k = \theta) = 1.$$

In other words, the probability of making a wrong decision goes to 0, i.e., $\lim_{k \to \infty} P_e^k = 0$. The
question we are interested in is this: In each of the two classes of failures, is there a decision strategy
such that the system asymptotically learns the underlying true hypothesis?
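
As a concrete illustration of the belief formulas (1), (2) and the likelihood ratio test in the definitions above, the following minimal sketch assumes Gaussian private signals with mean -1 under $H_0$, mean +1 under $H_1$, and unit variance. This signal model and the function names are our own assumptions for illustration; the definitions themselves only require unbounded likelihood ratios.

```python
import math

def likelihood_ratio(x, mu0=-1.0, mu1=1.0, sigma=1.0):
    """L(x) = dP1/dP0 (x) for the assumed Gaussian signal model."""
    return math.exp(((x - mu0) ** 2 - (x - mu1) ** 2) / (2.0 * sigma ** 2))

def belief(lr, pi0=0.5, pi1=0.5):
    """Posterior probability of H1 given a likelihood ratio, as in (1) and (2)."""
    return 1.0 / (1.0 + (pi0 / pi1) / lr)

def decide(lr_private, lr_public, threshold):
    """Likelihood ratio test: decide 1 if the product of the private and public
    likelihood ratios exceeds the threshold, and 0 otherwise."""
    return 1 if lr_private * lr_public > threshold else 0
```

With this model $\log L(x) = 2x$, so the likelihood ratio is unbounded as the assumptions require. Setting `threshold` to `pi0 / pi1` gives the MAP test, and setting it to 1 gives the ML test.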

III. Random Erasure

In this section, we consider the sequential hypothesis testing problem in the presence of random
erasures, modeled by binary erasure channels. Suppose that the binary message $d_k$ is the input to a
binary erasure channel and $\hat{d}_k$ is the output, which is either equal to $d_k$ (no erasure) or is equal to a
symbol $e$ that represents the occurrence of an erasure. The erasure channel matrix at stage $k$ is given
by $\mathbb{P}(\hat{d}_k = i | d_k = j)$, $j = 0, 1$ and $i = j, e$. Recall that each node $a_k$ observes $m_k$ immediate previous
broadcasted decisions. We divide our analysis into two scenarios: A) $\{m_k\}$ is bounded above by a positive
constant; B) $m_k$ goes to infinity as $k \to \infty$.

A. Bounded Memory 

Theorem 1: Suppose that there exists $C$ such that $m_k \le C$ for all $k$; and there exists $\epsilon > 0$ such that
for all $k$ and for $j = 0, 1$, $\mathbb{P}(\hat{d}_k = e | d_k = j) \in [\epsilon, 1 - \epsilon]$. Then, there does not exist a decision strategy
such that the error probability converges to 0.

Proof: We first prove this claim for the special case of the tandem network, where $m_k = 1$ for all $k$.
For each node $a_k$, with a nonzero probability $\mathbb{P}(\hat{d}_{k-1} = e | d_{k-1} = j)$, the decision $d_{k-1} = j$ of the immediate
predecessor is erased and node $a_k$ makes a decision based only on its own private signal $X_k$. In this
case, we claim that the error probability as a sequence of $k$,

$$P_e^k = \pi_0 \mathbb{P}_0(d_k = 1) + \pi_1 \mathbb{P}_1(d_k = 0)
       = \pi_0 \mathbb{P}_0(L(X_k) > t_k) + \pi_1 \mathbb{P}_1(L(X_k) \le t_k),$$

is bounded away from 0. We prove the above claim by contradiction. Suppose that there exists a decision
strategy with threshold sequence $\{t_k\}$ such that $P_e^k \to 0$ as $k \to \infty$. Then, we must have
$\mathbb{P}_1(L(X_k) \le t_k) \to 0$ because $\pi_1$ is positive. Because $\mathbb{P}_0$ and $\mathbb{P}_1$ are equivalent measures, this implies that
$\mathbb{P}_0(L(X_k) \le t_k) \to 0$. Hence we have $\mathbb{P}_0(L(X_k) > t_k) \to 1$. Therefore, $P_e^k$ does not converge to 0.

We can now generalize this proof to the case of a general bounded $m_k$ sequence. Let $\mathcal{E}_k$ be the event
that $a_k$ receives $m_k$ non-decodable symbols. Then, the probability $\mathbb{P}(\mathcal{E}_k)$ is bounded below according to

$$\mathbb{P}(\mathcal{E}_k) \ge \left( \min_{\substack{j = 0, 1 \\ m = k-1, \ldots, k-m_k}} \mathbb{P}(\hat{d}_m = e | d_m = j) \right)^{m_k} \ge \epsilon^{C}.$$

We have already shown that given this event the error probability does not converge to 0. Then by the
Law of Total Probability, we have $P_e^k \ge \mathbb{P}(\mathcal{E}_k) \mathbb{P}(d_k \ne \theta | \mathcal{E}_k)$, which means that the error probability does
not converge to 0.

This result is straightforward to understand. If the memory sizes are bounded for all nodes, then for
each node, there exists a positive probability such that all the decisions received from its immediate
predecessors are not decodable, in which case the node has to make a decision based on its own
measurement. The error probability cannot converge to 0 because of the equivalent-measure assumption.
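
The lower bound on the probability that a node receives only erasures can be checked numerically. The sketch below is illustrative only: it assumes that the erasure channels act independently, and the function names and example values are ours rather than the paper's.

```python
def all_erased_probability(erasure_probs):
    """Probability that every observed broadcast is erased, assuming the
    erasure channels act independently of one another."""
    prob = 1.0
    for p in erasure_probs:
        prob *= p
    return prob

def uniform_lower_bound(eps, C):
    """The bound eps**C used in the proof, valid when every erasure probability
    is at least eps (with 0 < eps < 1) and at most C decisions are observed."""
    return eps ** C
```

For example, with three observed predecessors whose erasure probabilities are 0.2, 0.5, and 0.3, the probability of receiving nothing decodable is 0.03, which is above the uniform bound $0.2^3 = 0.008$. Because the bound does not depend on $k$, the event that a node must rely on its private signal alone keeps a probability that is bounded away from zero.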

B. Unbounded Memory 

Suppose that each node $a_k$ observes $m_k$ immediate previous decisions. In this section, we deal with
the case where $m_k$ is unbounded.^1 More specifically, we consider the case where $m_k$ goes to infinity. We
first consider the case where the erasure probabilities are bounded away from 1. We have the following
result.

Theorem 2: Suppose that $m_k$ goes to infinity as $k \to \infty$ and there exists $\epsilon > 0$ such that for all
$j = 0, 1$ and for all $k$, $\mathbb{P}(\hat{d}_k = e | d_k = j) \le 1 - \epsilon$. Then, there exists a decision strategy such that the
error probability converges to 0.

Proof: We prove this result by constructing a certain tandem network within the original network
using a backward-searching scheme. The scheme is the following: Consider node $a_k$ in the original
network. Let $n_k$ be the largest integer such that each node in the sequence $\{a_{k-n_k^2}, a_{k-n_k^2+1}, \ldots, a_k\}$
of $n_k^2 + 1$ nodes has a memory size that is greater than or equal to $n_k$. Because $m_k$ goes to infinity as
$k \to \infty$, we have $n_k \to \infty$ as $k \to \infty$. Consider the event that $a_k$ receives at least one decision that
is decodable from $\{a_{k-n_k}, \ldots, a_{k-1}\}$, its $n_k$ immediate predecessors. The probability of this event is at
least

$$1 - \left( \max_{\substack{j = 0, 1 \\ m = k-n_k, \ldots, k-1}} \mathbb{P}(\hat{d}_m = e | d_m = j) \right)^{n_k},$$

which is bounded below by $1 - (1 - \epsilon)^{n_k}$ by the assumption on the erasure probabilities. Consider a
node that sends such an unerased decision. Similarly, with a certain probability, that node receives at least
one decodable decision from its immediate predecessors. Recursively, with a certain probability, we
can construct a tandem network with length $n_k$ using nodes from among the $n_k^2 + 1$ nodes above within the
original network. Let $\mathcal{E}_k$ be the event that such a tandem network exists. The probability $\mathbb{P}(\mathcal{E}_k)$ is at least
$(1 - (1 - \epsilon)^{n_k})^{n_k}$. Recall that $\lim_{k \to \infty} n_k = \infty$, which implies that $\lim_{k \to \infty} (1 - (1 - \epsilon)^{n_k})^{n_k} = 1$. Hence
we have $\lim_{k \to \infty} \mathbb{P}(\mathcal{E}_k) = 1$. Conditioned on $\mathcal{E}_k$, by using the strategy $T$ consisting of a sequence of
likelihood ratio tests with monotone thresholds described in [1], we can get the conditional convergence
of the error probability, given $\mathcal{E}_k$, to 0. We can also use the equilibrium strategy described in [11].
Therefore, by the law of total probability, we have

$$\lim_{k \to \infty} \mathbb{P}_T(d_k \ne \theta)
  = \lim_{k \to \infty} \left( \mathbb{P}_T(d_k \ne \theta | \mathcal{E}_k) \mathbb{P}(\mathcal{E}_k) + \mathbb{P}_T(d_k \ne \theta | \mathcal{E}_k^c) \mathbb{P}(\mathcal{E}_k^c) \right)
  \le \lim_{k \to \infty} \left( \mathbb{P}_T(d_k \ne \theta | \mathcal{E}_k) + (1 - \mathbb{P}(\mathcal{E}_k)) \right) = 0. \qquad (3)$$

Note that the convergence rate for the error probability in this case depends on how fast $\mathbb{P}(\mathcal{E}_k)$ converges
to 1 and how fast $\mathbb{P}_T(d_k \ne \theta | \mathcal{E}_k)$ converges to 0.

^1 The assumption that $m_k$ is unbounded is not sufficiently strong to guarantee the convergence of error probability to 0. An
example is that the memory size $m_k$ equals $\sqrt{k}$ if $\sqrt{k}$ is an integer and it equals 1 otherwise. In this case, we can use a similar
argument as that in the proof of Theorem 1 to show that the error probability does not converge to 0.
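
The limit $(1 - (1 - \epsilon)^{n_k})^{n_k} \to 1$ used in the proof of Theorem 2 can be checked with Bernoulli's inequality. The short derivation below is a sketch added for completeness, under the theorem's assumption $0 < \epsilon \le 1$; it also shows how fast the lower bound on $\mathbb{P}(\mathcal{E}_k)$ approaches 1.

```latex
% Bernoulli's inequality: (1 - x)^n >= 1 - n x for 0 <= x <= 1 and integer n >= 1.
\begin{align*}
\mathbb{P}(\mathcal{E}_k)
  &\ge \bigl(1 - (1-\epsilon)^{n_k}\bigr)^{n_k}
   \ge 1 - n_k (1-\epsilon)^{n_k}, \\
0 \le 1 - \mathbb{P}(\mathcal{E}_k)
  &\le n_k (1-\epsilon)^{n_k} \longrightarrow 0
   \quad \text{as } n_k \to \infty ,
\end{align*}
```

since the geometric factor $(1-\epsilon)^{n_k}$ decays faster than $n_k$ grows. In other words, $1 - \mathbb{P}(\mathcal{E}_k) = O(n_k (1-\epsilon)^{n_k})$.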

First let us consider the convergence rate of $\mathbb{P}(\mathcal{E}_k)$. Obviously this convergence rate depends on the
growth rate of $n_k$. Moreover, the growth rate of $n_k$ depends on the growth rate of $m_k$.
For example, if $m_k$ goes to infinity extremely slowly, then $n_k$ grows extremely slowly with respect to
$k$, which means that $\mathbb{P}(\mathcal{E}_k)$ converges to 1 extremely slowly with respect to $k$. Next we assume that
$m_k$ increases as $\Theta(k^{\alpha})$. We first establish a relationship between the growth rate of $m_k$ and the
growth rate of $n_k$ when using the backward-searching scheme.

Proposition 1: Suppose that $m_k = \Theta(k^{\alpha})$ where $\alpha \le 1$. Then, we have

$$n_k = \begin{cases} \Theta(\sqrt{k}) & \text{if } \alpha \ge 1/2, \\ \Theta(k^{\alpha}) & \text{if } \alpha < 1/2. \end{cases}$$

Proof: Suppose that we can form a tandem network with length $n_k$ within the original network.
Recall that $n_k$ is the largest integer such that each node in the sequence $\{a_{k-n_k^2}, a_{k-n_k^2+1}, \ldots, a_k\}$ of
$n_k^2 + 1$ nodes has a memory size that is greater than or equal to $n_k$. Therefore, the memory size $m_{k-n_k^2}$
of $a_{k-n_k^2}$ must be larger than or equal to $n_k$ by assumption. Hence we have

$$m_{k-n_k^2} = (k - n_k^2)^{\alpha} \ge n_k.$$

Moreover, the memory size $m_{k-(n_k+1)^2}$ of $a_{k-(n_k+1)^2}$ must be strictly smaller than $n_k + 1$ (otherwise
we can construct a tandem network with length $n_k + 1$). Hence we have

$$m_{k-(n_k+1)^2} = (k - (n_k + 1)^2)^{\alpha} < n_k + 1.$$

From the above two inequalities, we easily obtain the desired asymptotic rates for $n_k$.
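
The backward-searching quantity $n_k$ can also be computed directly from a memory profile. The sketch below is our own illustration of the definition; the helper `power_memory`, which sets $m_j = \lfloor j^{\alpha} \rfloor$ capped at $j - 1$, is a hypothetical profile chosen only to exercise it.

```python
def tandem_length(k, memory):
    """Largest n >= 0 such that the nodes a_{k-n^2}, ..., a_k all exist
    (index >= 1) and each has memory size at least n.
    `memory(j)` returns the memory size m_j of node a_j."""
    n = 0
    while True:
        cand = n + 1
        start = k - cand * cand
        if start < 1:
            break  # not enough predecessors for a block of cand^2 + 1 nodes
        if any(memory(j) < cand for j in range(start, k + 1)):
            break  # some node in the block remembers fewer than cand decisions
        n = cand
    return n

def power_memory(alpha):
    """Hypothetical memory profile m_j = floor(j**alpha), capped at j - 1."""
    return lambda j: min(j - 1, int(j ** alpha))
```

If a candidate length fails, every larger candidate fails as well, because it needs a longer block and a larger memory, so the linear search stops at the largest feasible value. For instance, `tandem_length(10000, power_memory(0.25))` is limited by the memory size (about $k^{1/4} = 10$), whereas with full memory `lambda j: j - 1` the length is limited by the block size $n_k^2 + 1 \le k$, in line with the two regimes of Proposition 1.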
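We have derived the growth rate for $n_k$. Recall that $\mathbb{P}(\mathcal{E}_k)$ converges to 1 at least in the rate of
$O(n_k (1 - \epsilon)^{n_k})$. From this fact and Proposition 1, we derive the convergence rate for $\mathbb{P}(\mathcal{E}_k)$.

Corollary 1: Suppose that $m_k = \Theta(k^{\alpha})$ where $\alpha \le 1$. Then, we have

$$1 - \mathbb{P}(\mathcal{E}_k) = \begin{cases} O(\sqrt{k} \, (1 - \epsilon)^{\sqrt{k}}) & \text{if } \alpha \ge 1/2, \\ O(k^{\alpha} (1 - \epsilon)^{k^{\alpha}}) & \text{if } \alpha < 1/2. \end{cases}$$

Second, let us consider the convergence rate of $\mathbb{P}_T(d_k \ne \theta | \mathcal{E}_k)$. Recall that $\mathcal{E}_k$ denotes the event that a
tandem network with length $n_k$ exists. Conditioned on $\mathcal{E}_k$, if we use the equilibrium strategy described
in [11], then it has been shown that the error probability converges to 0 as $\Theta(1/n_k)$, with appropriate
assumptions on the distributions of the private signal. From this fact and Proposition 1, we derive the
convergence rate for $\mathbb{P}_T(d_k \ne \theta | \mathcal{E}_k)$.

Corollary 2: Suppose that $m_k = \Theta(k^{\alpha})$ where $\alpha \le 1$. Then, we have

$$\mathbb{P}_T(d_k \ne \theta | \mathcal{E}_k) = \begin{cases} \Theta(1/\sqrt{k}) & \text{if } \alpha \ge 1/2, \\ \Theta(1/k^{\alpha}) & \text{if } \alpha < 1/2. \end{cases}$$

Notice that $\mathbb{P}_T(d_k \ne \theta | \mathcal{E}_k)$ converges much more slowly than $\mathbb{P}(\mathcal{E}_k)$. Moreover by (3),
the convergence rate of $\mathbb{P}_T(d_k \ne \theta)$ depends on the slower of the convergence rates of $\mathbb{P}_T(d_k \ne \theta | \mathcal{E}_k)$
and $\mathbb{P}(\mathcal{E}_k)$. We derive the convergence rate for the error probability as follows.

Corollary 3: Suppose that $m_k = \Theta(k^{\alpha})$ where $\alpha \le 1$. Then, we have

$$\mathbb{P}_T(d_k \ne \theta) = \begin{cases} \Theta(1/\sqrt{k}) & \text{if } \alpha \ge 1/2, \\ \Theta(1/k^{\alpha}) & \text{if } \alpha < 1/2. \end{cases}$$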



We have considered the situation where the erasure probabilities are bounded away from 1. Now
consider the case where the erasure probability $\mathbb{P}(\hat{d}_k = e | d_k = j)$ converges to 1.

Theorem 3: Suppose that $\mathbb{P}(\hat{d}_k = e | d_k = j) \to 1$ and there exists $\epsilon > 1$ and $c > 0$ such that
$\mathbb{P}(\hat{d}_k = e | d_k = j) \le (c n_k)^{-\epsilon / n_k}$. Then, there exists a decision strategy such that the error probability
converges to 0.

Proof: We use the scheme described in the proof of Theorem 2. The probability that a tandem
network with length $n_k$ exists is at least $(1 - ((c n_k)^{-\epsilon / n_k})^{n_k})^{n_k} = (1 - (c n_k)^{-\epsilon})^{n_k}$, which converges to
1 as $k \to \infty$. Using the same arguments as those in the proof of Theorem 2, we can show that the error
probability converges to 0.
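
The convergence claimed in the proof of Theorem 3 can be verified in two lines. The derivation below is our own filling-in of the omitted step; it assumes $c n_k > 1$, so that the bound on the erasure probability is less than 1, and it shows where the requirement $\epsilon > 1$ enters.

```latex
\begin{align*}
\Bigl(1 - \bigl((c n_k)^{-\epsilon/n_k}\bigr)^{n_k}\Bigr)^{n_k}
  &= \bigl(1 - (c n_k)^{-\epsilon}\bigr)^{n_k} \\
  &\ge 1 - n_k (c n_k)^{-\epsilon}
   = 1 - c^{-\epsilon} \, n_k^{\,1-\epsilon}
   \longrightarrow 1 \quad \text{as } n_k \to \infty .
\end{align*}
```

The inequality is Bernoulli's, $(1 - x)^n \ge 1 - n x$ for $0 \le x \le 1$, and the limit holds because $1 - \epsilon < 0$.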
As an example, we consider the situation where each node observes all the previous decisions; i.e.,
$m_k = k - 1$ for all $k$. In this case, it is easy to show that using the backward-searching scheme, with
a certain probability, we can form a tandem network with length $n_k = \lfloor \sqrt{k - 1} \rfloor$. Suppose that the
erasure probabilities are bounded away from 1. Then, the error probability converges to 0 as $\Theta(1/\sqrt{k})$.
Moreover, the error probability converges to 0 even if the erasure probability converges to 1, provided
that $\mathbb{P}(\hat{d}_k = e | d_k = j) \le (c n_k)^{-\epsilon / n_k}$.

IV. Random Flipping 

We study in this section the sequential hypothesis testing problem with random flipping, modeled by a
binary symmetric channel. Suppose that $d_k$ is the input to a binary symmetric channel and $\hat{d}_k$ is the output,
which is either equal to $d_k$ (no flipping) or is equal to its complement $1 - d_k$ (flipping). The channel matrix
is given by $\mathbb{P}(\hat{d}_k = i | d_k = j)$, $i, j = 0, 1$. We assume that $\mathbb{P}(\hat{d}_k = 1 | d_k = 0) = \mathbb{P}(\hat{d}_k = 0 | d_k = 1) = q_k$,
where $q_k$ denotes the probability of a flip. The assumption of symmetry is for simplicity only, and all
results obtained in this section can be generalized easily to a general binary communication channel with
unequal flipping probabilities, i.e., $\mathbb{P}(\hat{d}_k = 1 | d_k = 0) \ne \mathbb{P}(\hat{d}_k = 0 | d_k = 1)$.

A. Bounded Memory 

Theorem 4: Suppose that there exists $C$ such that $m_k \le C$ for all $k$; and there exists $\epsilon > 0$ such
that for all $k$ we have $q_k \in [\epsilon, 1 - \epsilon]$. Then, there does not exist a decision strategy such that the error
probability converges to 0.

Proof: We first prove this theorem in the case where each node observes the immediate previous
node; i.e., $m_k = 1$ for all $k$. Node $a_k$ makes a decision $d_k$ based on its private signal $X_k$ and the decision
$\hat{d}_{k-1}$ from its immediate predecessor. Recall that $q_k = \mathbb{P}(\hat{d}_k = 1 | d_k = 0) = \mathbb{P}(\hat{d}_k = 0 | d_k = 1)$. The
likelihood ratio test at stage $k$ (with a threshold $t_k > 0$) is

$$d_k = \begin{cases} 1 & \text{if } L(X_k) L(\hat{d}_{k-1}) > t_k, \\ 0 & \text{if } L(X_k) L(\hat{d}_{k-1}) \le t_k, \end{cases}$$

where

$$L(\hat{d}_{k-1}) = \frac{\mathbb{P}_1(\hat{d}_{k-1})}{\mathbb{P}_0(\hat{d}_{k-1})}$$

and $\mathbb{P}_j(\hat{d}_{k-1})$, $j = 0, 1$, is given by

$$\mathbb{P}_j(\hat{d}_{k-1}) = q_{k-1} (1 - \mathbb{P}_j(d_{k-1})) + (1 - q_{k-1}) \mathbb{P}_j(d_{k-1})
                       = q_{k-1} + (1 - 2 q_{k-1}) \mathbb{P}_j(d_{k-1}). \qquad (4)$$

Let $t_k(\hat{d}_{k-1}) = t_k / L(\hat{d}_{k-1})$ be the testing threshold for $L(X_k)$ when $\hat{d}_{k-1}$ is received. Then, the
likelihood ratio test can be rewritten as

$$d_k = \begin{cases} 1 & \text{if } L(X_k) > t_k(\hat{d}_{k-1}), \\ 0 & \text{if } L(X_k) \le t_k(\hat{d}_{k-1}). \end{cases}$$

From (4), we notice that $\mathbb{P}_j(\hat{d}_{k-1})$ depends linearly on $\mathbb{P}_j(d_{k-1})$. Without loss of generality, henceforth we
assume that $q_k < 1/2$.^2 It is obvious that $t_k(0) \ge t_k(1)$ because $L(\hat{d}_{k-1}) = \mathbb{P}_1(\hat{d}_{k-1} = j) / \mathbb{P}_0(\hat{d}_{k-1} = j)$
is non-decreasing in $j$. Therefore, the likelihood ratio test becomes

$$d_k = \begin{cases} 1 & \text{if } L(X_k) > t_k(0), \\ 0 & \text{if } L(X_k) \le t_k(1), \\ \hat{d}_{k-1} & \text{otherwise}, \end{cases}$$

and we can write the Type I and Type II error probabilities, given by $\mathbb{P}_0(d_k = 1)$ and $\mathbb{P}_1(d_k = 0)$,
respectively, as follows:

$$\mathbb{P}_0(d_k = 1) = \mathbb{P}_0(L(X_k) > t_k(0)) \mathbb{P}_0(\hat{d}_{k-1} = 0) + \mathbb{P}_0(L(X_k) > t_k(1)) \mathbb{P}_0(\hat{d}_{k-1} = 1),$$
$$\mathbb{P}_1(d_k = 0) = \mathbb{P}_1(L(X_k) \le t_k(0)) \mathbb{P}_1(\hat{d}_{k-1} = 0) + \mathbb{P}_1(L(X_k) \le t_k(1)) \mathbb{P}_1(\hat{d}_{k-1} = 1).$$

The total error probability at stage $k$ is

$$P_e^k = \pi_0 \mathbb{P}_0(d_k = 1) + \pi_1 \mathbb{P}_1(d_k = 0)$$
$$\quad = \pi_0 \left( \mathbb{P}_0(L(X_k) > t_k(0)) + \mathbb{P}_0(t_k(1) < L(X_k) \le t_k(0)) \mathbb{P}_0(\hat{d}_{k-1} = 1) \right)$$
$$\qquad + \pi_1 \left( \mathbb{P}_1(t_k(1) < L(X_k) \le t_k(0)) \mathbb{P}_1(\hat{d}_{k-1} = 0) + \mathbb{P}_1(L(X_k) \le t_k(1)) \right).$$

We prove the claim by contradiction. Suppose that there exists a strategy such that $P_e^k \to 0$ as $k \to \infty$.
Then, we must have $\mathbb{P}_0(L(X_k) > t_k(0)) \to 0$ and $\mathbb{P}_1(L(X_k) \le t_k(1)) \to 0$. Recall that $\mathbb{P}_0$ and $\mathbb{P}_1$ are
equivalent measures. Hence we have $\mathbb{P}_1(L(X_k) > t_k(0)) \to 0$ and $\mathbb{P}_0(L(X_k) \le t_k(1)) \to 0$. These imply
that $\mathbb{P}_j(t_k(1) < L(X_k) \le t_k(0)) \to 1$ for $j = 0, 1$. But
$\mathbb{P}_j(\hat{d}_{k-1}) = q_{k-1} (1 - \mathbb{P}_j(d_{k-1})) + (1 - q_{k-1}) \mathbb{P}_j(d_{k-1}) = q_{k-1} + (1 - 2 q_{k-1}) \mathbb{P}_j(d_{k-1})$,
which is bounded below by $q_{k-1} \ge \epsilon$. In particular,
$P_e^k \ge \pi_0 \mathbb{P}_0(t_k(1) < L(X_k) \le t_k(0)) \mathbb{P}_0(\hat{d}_{k-1} = 1)$, where the first probability tends to 1 and the
second is at least $\epsilon$. Hence $P_e^k$ is also bounded below away from
0 in the asymptotic regime. This contradiction implies that $P_e^k$ does not converge to 0. The proof for the
general bounded memory case is similar and is given in Appendix A.

■

^2 Note that the system is symmetric with respect to $q_k = 1/2$. For example, if the probability of flipping is 1, i.e., $q_k = 1$,
then the receiver can revert the received decision back since it knows the predecessor always 'lies.'
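
The key step in the proof is that flipping keeps the distribution of the received decision away from 0 and 1, however accurate the sent decision is. A minimal numerical sketch of equation (4) follows; the function name and the example values are ours.

```python
def flipped_marginal(p, q):
    """Probability that the received decision takes a given value when the sent
    decision takes that value with probability p and the channel flips it with
    probability q, as in equation (4)."""
    return q * (1.0 - p) + (1.0 - q) * p
```

For any `p` in [0, 1] and `q` below 1/2, the result lies in the interval [`q`, 1 - `q`]. For example, a predecessor that is never wrong (`p = 0` for the wrong value) is still received as wrong with probability `q`, which is why the error probability cannot vanish when each node observes only a bounded number of flipped decisions.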

B. Unbounded Memory 

In this section, we consider the case where each node $a_k$ can observe all its predecessors; i.e., $m_k = k - 1$. We
will show that using the myopic decision strategy, the error probability converges to 0 in the presence
of random flipping when the flipping probabilities are bounded away from 1/2. In the case where the
flipping probability converges to 1/2, we derive a necessary condition on the convergence rate of the
flipping probability such that the error probability converges to 0. Moreover, we precisely describe the
relationship between the convergence rate of the flipping probability and the convergence rate of the error
probability.

If we state the conditions on the private signal distributions in a symmetric way, then it suffices to
consider the case when the true hypothesis is $H_0$. In this case, our aim is to show that the Type I error
probability converges to 0, i.e., $\mathbb{P}_0(d_k = 1) \to 0$, which is equivalent to saying that the public likelihood
ratio $L_k = \mathbb{P}_1(\hat{D}_k) / \mathbb{P}_0(\hat{D}_k)$ of all the decisions converges to 0. We consider the myopic decision strategy;
i.e., the decision made by the $k$th node is on the basis of the MAP test

$$\frac{\mathbb{P}_1(X_k, \hat{D}_{k-1})}{\mathbb{P}_0(X_k, \hat{D}_{k-1})} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{\pi_0}{\pi_1},$$

where $\pi_0$ and $\pi_1$ are prior probabilities of the two hypotheses. Again, the corruption from $d_k$ to $\hat{d}_k$
is in the form of a binary symmetric channel with flipping probability denoted by $q_k$. Without loss of
generality, we assume that $q_k < 1/2$ (because of symmetry). We will consider two cases:

1) The flipping probabilities are bounded away from 1/2 for all $k$; i.e., there exists $c > 0$ such that
$q_k \le 1/2 - c$ for all $k$. This ensures that the corrupted decision still contains some useful information
about the true hypothesis. We call this the case of uniformly informative nodes.

2) The flipping probabilities $q_k$ converge to 1/2; i.e., $q_k \to 1/2$ as $k \to \infty$. This means that the
broadcasted decisions become increasingly uninformative as we move towards the latter nodes. We
call this the case of asymptotically uninformative nodes.

1) Uniformly informative nodes: We first show that the error probability converges to 0. Recall that
$b = \mathbb{P}(H_1 | X)$ denotes the private belief given by signal $X$. Let $(G_0, G_1)$ be the conditional distributions
of the private belief $b$:

$$G_j(s) = \mathbb{P}_j(b \le s).$$

These distributions exhibit two important properties:

a) Density proportionality: This property is easy to get from Bayes' rule:

$$\frac{dG_1}{dG_0}(b) = \frac{b}{1 - b}.$$

b) Dominance: $G_1(s) < G_0(s)$ for all $s \in (0, 1)$, and $G_j(0) = 0$ and $G_j(1) = 1$ for $j = 0, 1$.

We define an increasing sequence $\{\mathcal{F}_k\}$ of $\sigma$-algebras as follows:

$$\mathcal{F}_k = \sigma\langle X_1, X_2, \ldots, X_k; \hat{d}_1, \hat{d}_2, \ldots, \hat{d}_k \rangle.$$

Evidently $\hat{d}_k$ is adapted to this sequence of $\sigma$-algebras. Moreover, given $\hat{D}_{k-1} = \{\hat{d}_1, \hat{d}_2, \ldots, \hat{d}_{k-1}\}$
and $X_k$, the decision $d_k$ is completely determined. Therefore, $d_k$ is also adapted to this sequence of
$\sigma$-algebras. Note that the public likelihood ratio is

$$L_k = \frac{\mathbb{P}_1(\hat{D}_k)}{\mathbb{P}_0(\hat{D}_k)} = \frac{\hat{b}_k}{1 - \hat{b}_k},$$

where $\hat{b}_k$ denotes the public belief. Again $L_k$ is adapted to $\mathcal{F}_k$.

Lemma 1: Under hypothesis $H_0$, the public likelihood ratio sequence $\{L_k\}$ is a martingale with respect
to $\{\mathcal{F}_k\}$ and $L_k$ converges to a finite limit almost surely.

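Proof: The expectation of $L_{k+1}$ conditioned on $H_0$ and $\mathcal{F}_k$ is

$$\mathbb{E}_0[L_{k+1} | \mathcal{F}_k]
  = \sum_{\hat{d}_{k+1} = 0, 1} \mathbb{P}_0(\hat{d}_{k+1} | \mathcal{F}_k) L_{k+1}
  = \sum_{\hat{d}_{k+1} = 0, 1} \mathbb{P}_0(\hat{d}_{k+1} | \mathcal{F}_k) L_k \frac{\mathbb{P}_1(\hat{d}_{k+1} | \mathcal{F}_k)}{\mathbb{P}_0(\hat{d}_{k+1} | \mathcal{F}_k)}
  = L_k \sum_{\hat{d}_{k+1} = 0, 1} \mathbb{P}_1(\hat{d}_{k+1} | \mathcal{F}_k)
  = L_k.$$

Moreover, note that

$$\int |L_1| \, d\mathbb{P}_0 = 1 < \infty.$$

Since $L_k$ is a non-negative martingale, by Doob's martingale convergence theorem [24], it converges almost
surely to a finite limit, assuming that $\mathbb{E}_0[L_k]$ (which is independent of $k$) is finite.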
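
The one-step martingale identity in the proof of Lemma 1 can be checked numerically for any pair of conditional probabilities. The sketch below is illustrative; `p0` and `p1` stand for the conditional probabilities that the next received decision equals 1 under $H_0$ and $H_1$, and the function name is ours.

```python
def expected_next_ratio(L_k, p0, p1):
    """E_0[L_{k+1} | F_k] when the next received decision equals 1 with
    conditional probability p0 under H0 and p1 under H1 (requires 0 < p0 < 1)."""
    next_if_one = L_k * p1 / p0                    # update after receiving a 1
    next_if_zero = L_k * (1.0 - p1) / (1.0 - p0)   # update after receiving a 0
    return p0 * next_if_one + (1.0 - p0) * next_if_zero
```

Whatever the values of `p0` and `p1`, the weights `p0` and `1 - p0` cancel the denominators of the two updates, so the expectation under $H_0$ equals `L_k`. With random flipping and $0 < q_k < 1$, equation (4) keeps `p0` strictly between 0 and 1, so the updates are always well defined.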
Let $L_\infty$ be the almost sure limit of $L_k$ conditioned on $H_0$, and note that $L_\infty < \infty$ almost surely. This 
claim holds for both cases 1 and 2. Since $L_k = b_k/(1-b_k)$, the limit of the public belief $b_k$ is strictly less than 1 almost surely. The 
implication is that the public belief cannot go completely wrong. Moreover, for case 1, we can show that 
the public likelihood ratio converges to 0. 

Lemma 2: Suppose that the flipping probabilities are bounded away from 1/2. Then under $H_0$, we 
have $L_\infty = 0$ almost surely. 

Proof: The proof is given in Appendix B. 

■ 

Theorem 5: Suppose that the flipping probabilities are bounded away from 1/2. Then, $\mathbb{P}_e^k \to 0$ as 
$k \to \infty$. 

Proof: The likelihood ratio test states that $a_k$ decides 1 if and only if $\tilde{b}_k > 1 - b_k$, where $\tilde{b}_k$ denotes the private belief of $a_k$. 
The probability of deciding 1 given that $H_0$ is true (Type I error) is given by 

$$\mathbb{P}_0(d_k = 1) = \mathbb{P}_0(\tilde{b}_k > 1 - b_k) = \mathbb{E}_0(1 - G_0(1 - b_k)).$$

Since $L_\infty = 0$ almost surely, we have $b_k \to 0$ almost surely. We have 

$$\lim_{k \to \infty} \mathbb{P}_0(d_k = 1) = \lim_{k \to \infty} \mathbb{E}_0(1 - G_0(1 - b_k)).$$

By the bounded convergence theorem, we have 

$$\lim_{k \to \infty} \mathbb{P}_0(d_k = 1) = 1 - \mathbb{E}_0\Big(\lim_{k \to \infty} G_0(1 - b_k)\Big) = 1 - G_0(1) = 0.$$

Similarly, we can prove that $\lim_{k \to \infty} \mathbb{P}_1(d_k = 0) = 0$ (i.e., the Type II error probability converges to 0). 
Therefore, the error probability converges to 0. 

■ 

Remark 1 (Additive Gaussian noise): Note that our convergence proof easily generalizes to the additive 
Gaussian noise scenario: Suppose that after $a_k$ makes a decision $d_k \in \{0, 1\}$, it broadcasts a message 
$\hat{d}_k = F_k d_k + N_k$ to other nodes, where $F_k \in (0, 1)$ denotes a fading coefficient and $N_k$ denotes zero-mean 
Gaussian noise with a finite variance $\sigma_k^2$. Then, we can show that the error probability converges to 0 if 
$F_k$ are bounded away from 0 and $\sigma_k$ are bounded for all $k$. In other words, the signal-to-noise ratios are 
bounded away from 0. 
Now let us consider the convergence rate of the error probability. Without loss of generality, we assume 
that the prior probabilities are equal; i.e., $\pi_0 = \pi_1 = 1/2$. The following analysis easily generalizes to 
unequal prior probabilities. Recall that $b_k = \mathbb{P}(H_1 \mid D_k)$ denotes the public belief. It is easy to see that 
the error probability converges to 0 if and only if $b_k \to 0$ almost surely given $H_0$ is true and $b_k \to 1$ 
almost surely given $H_1$ is true. Also by the property of the MAP test, $a_{k+1}$ decides $H_1$ if and only if 
$\tilde{b}_{k+1} > 1 - b_k$. Recall the density proportionality property: 

$$\frac{dG_1}{dG_0}(b) = \frac{b}{1-b}.$$

We further assume that under $H_0$ and $H_1$, the density of $b$ exists. By the above property, we can write 
these densities as follows: 

$$f^1(b) = b\,p(b), \qquad f^0(b) = (1-b)\,p(b),$$

where $p(b)$ is a non-negative function. 

Without loss of generality, we assume that $H_0$ is the true hypothesis. Moreover, we assume that 
$p(1) > 0$ and $p$ is continuous near $b = 1$. This characterizes the behavior of the tail densities. We will 
generalize our analysis to polynomial tail densities later, where $p(b) \to 0$ as $b \to 1$. 

Let $\mathcal{H}$ denote the event that there exists a (random) $k_0$ such that the sequence of decisions $d_k = 0$ for 
all $k > k_0$. Occurrence of this event signifies that after a finite number of decisions, the agents arrive 
at the true underlying state. Such an outcome also means that, eventually, each agent's private signal is 
overpowered by the past collective true verdict, so that a false decision is never again declared. In the 
literature on social learning, this phenomenon is called information cascade (e.g., [25]) or herding (e.g., 
[19]). The occurrence of this event $\mathcal{H}$ represents the best-case scenario in the sense that the convergence 
of the public belief to 0 is the fastest among all possible sequences of $d_k$. Therefore, the convergence 
rate analysis conditioned on $\mathcal{H}$ provides an upper bound for the convergence rate of the error probability. 

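Conditioned on $\mathcal{H}$, the Bayesian update of the public belief when the broadcast decision $\hat{d}_{k+1} = 0$ is received is given by: 

$$\begin{aligned}
b_{k+1} &= \mathbb{P}(H_1 \mid D_{k+1}) \\
&= \frac{\mathbb{P}_1(\hat{d}_{k+1} \mid D_k)\,\mathbb{P}(H_1 \mid D_k)}{\mathbb{P}_1(\hat{d}_{k+1} \mid D_k)\,\mathbb{P}(H_1 \mid D_k) + \mathbb{P}_0(\hat{d}_{k+1} \mid D_k)\,\mathbb{P}(H_0 \mid D_k)} \\
&= \frac{\big(q_k + (1 - 2q_k)\,\mathbb{P}_1(d_{k+1} = 0 \mid D_k)\big)\, b_k}{\sum_{j=0,1} \big(q_k + (1 - 2q_k)\,\mathbb{P}_j(d_{k+1} = 0 \mid D_k)\big)\,\mathbb{P}(H_j \mid D_k)}.
\end{aligned} \tag{5}$$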
Notice that $\mathbb{P}(H_1 \mid D_k) = b_k$ and $\mathbb{P}(H_0 \mid D_k) = 1 - b_k$. By Lemma 2, we have $L_k \to 0$. This implies that 
$b_k \to 0$. If $b_k$ is sufficiently small, then we have 

$$\mathbb{P}_1(d_{k+1} = 0 \mid D_k) = 1 - \int_{1-b_k}^{1} f^1(x)\,dx \simeq 1 - p(1)\left(b_k - \frac{b_k^2}{2}\right) \tag{6}$$

and 

$$\mathbb{P}_0(d_{k+1} = 0 \mid D_k) = 1 - \int_{1-b_k}^{1} f^0(x)\,dx \simeq 1 - p(1)\frac{b_k^2}{2}. \tag{7}$$

We can also calculate the (conditional) Type I error probability: 

$$\mathbb{P}_0(d_{k+1} = 1 \mid D_k) = 1 - \mathbb{P}_0(d_{k+1} = 0 \mid D_k) = \int_{1-b_k}^{1} f^0(x)\,dx \simeq \frac{p(1)}{2}\, b_k^2. \tag{8}$$

Note that (8) characterizes the relationship between the decay rate of the Type I error probability and the 
decay rate of $b_k$. Next we derive the decay rate of $b_k$. 

Substituting (6) and (7) into (5) and removing high-order terms, we obtain 

$$b_{k+1} \simeq \frac{(1 - q_k)\, b_k - (1 - 2q_k)\, p(1)\, b_k^2}{1 - q_k}.$$

This implies that 

$$b_{k+1} = b_k \left(1 - \frac{1 - 2q_k}{1 - q_k}\, p(1)\, b_k\right). \tag{9}$$
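
The quality of the first-order recursion (9) can be checked numerically against the exact Bayes update (5). The sketch below is only an illustration under assumptions not made in the text: equal priors and the constant choice $p(b) = 2$, for which $f^1(x) = 2x$ and $f^0(x) = 2(1-x)$ are valid densities on $[0, 1]$. The function names are hypothetical.

```python
def exact_update(b, q):
    """Exact Bayes update (5) of the public belief after a broadcast decision 0.

    Assumes, for illustration only, equal priors and p(b) = 2 on [0, 1],
    so that f^1(x) = 2x and f^0(x) = 2(1 - x).
    """
    p1_zero = (1.0 - b) ** 2   # P_1(d_{k+1} = 0 | D_k) = 1 - int_{1-b}^1 2x dx
    p0_zero = 1.0 - b ** 2     # P_0(d_{k+1} = 0 | D_k) = 1 - int_{1-b}^1 2(1-x) dx
    w1 = q + (1.0 - 2.0 * q) * p1_zero   # P_1(broadcast decision = 0 | D_k)
    w0 = q + (1.0 - 2.0 * q) * p0_zero   # P_0(broadcast decision = 0 | D_k)
    return w1 * b / (w1 * b + w0 * (1.0 - b))


def approx_update(b, q, p_at_1=2.0):
    """First-order recursion (9): b_{k+1} = b_k (1 - (1-2q)/(1-q) p(1) b_k)."""
    return b * (1.0 - (1.0 - 2.0 * q) / (1.0 - q) * p_at_1 * b)


for b in (1e-1, 1e-2, 1e-3):
    print(b, exact_update(b, 0.1), approx_update(b, 0.1))
```

For this particular density the gap between the two updates shrinks like $b_k^3$, which is consistent with dropping the high-order terms. With $q = 1/2$ the exact update leaves the belief unchanged, as expected for an uninformative broadcast.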

For any sequence that evolves according to (9), the following lemma characterizes the convergence rate 
of the sequence. 

Lemma 3: Suppose that a non-negative sequence $c_k$ satisfies $c_{k+1} = c_k(1 - a c_k^n)$, where $n \ge 1$, $c_k^n < 1$, 
and $a > 0$. Then, for sufficiently large $k$, there exist two constants $C_1$ and $C_2$ such that 

$$\frac{C_1}{(ak)^{1/n}} \le c_k \le \frac{C_2}{(ak)^{1/n}}.$$

This implies that $c_k \to 0$ as $k \to \infty$ and $c_k = \Theta(k^{-1/n})$. 
Proof: The proof is given in Appendix C. 

■ 
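
The scaling in Lemma 3 is easy to observe numerically. The following sketch iterates the recursion and rescales the result by $(nak)^{1/n}$. The extra factor $n^{1/n}$ comes from solving the associated differential equation and is absorbed into $C_1$ and $C_2$ in the lemma. The function name and the parameter values are illustrative assumptions.

```python
def scaled_tail(a, n, c1, steps):
    """Iterate c_{k+1} = c_k (1 - a c_k^n) from c_1 = c1 and
    return (n a k)^(1/n) * c_k evaluated at k = steps."""
    c = c1
    for _ in range(1, steps):        # steps - 1 updates take c_1 to c_steps
        c = c * (1.0 - a * c ** n)
    return (n * a * steps) ** (1.0 / n) * c


for n in (1, 2):
    print(n, scaled_tail(a=0.5, n=n, c1=0.5, steps=100_000))
```

In both cases the rescaled value stays close to (and below) 1, in line with $c_k = \Theta(k^{-1/n})$.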

Theorem 6: Suppose that the flipping probabilities are bounded away from 1/2 and $p(1)$ is a non-negative 
constant. Then, the Type I error probability converges to 0 as $\Omega(k^{-2})$. 

Proof: Using (9) and Lemma 3, we can get the convergence rate of the public belief conditioned on 
$\mathcal{H}$, that is, $b_k = \Theta(k^{-1})$. Recall that the occurrence of $\mathcal{H}$ represents the best-case scenario in the sense 
that the convergence of $b_k$ is the fastest among all possible outcomes. Therefore, we have $b_k = \Omega(k^{-1})$ 
almost surely. 
Recall that $d_k = 1$ if and only if $\tilde{b}_k > 1 - b_k$. Therefore, the Type I error probability is given by 

$$\mathbb{P}_0(d_k = 1) = \mathbb{P}_0(\tilde{b}_k > 1 - b_k) = \mathbb{E}_0(1 - G_0(1 - b_k)). \tag{10}$$

Because $p$ is continuous at 1, if $x < 1$ is sufficiently close to 1, i.e., $1 - x$ is positive and 
sufficiently small, then 

$$1 - G_0(x) = \int_x^1 (1 - t)\,p(t)\,dt \ge \frac{p(1)}{2} \int_x^1 (1 - t)\,dt = \frac{p(1)(1-x)^2}{4}. \tag{11}$$

From (10) and (11) and invoking Jensen's inequality, we obtain 

$$\mathbb{P}_0(d_k = 1) \ge \frac{p(1)}{4}\,\mathbb{E}_0[b_k^2] \ge \frac{p(1)}{4}\,(\mathbb{E}_0[b_k])^2. \tag{12}$$

Because $b_k = \Omega(k^{-1})$ almost surely, we have $\mathbb{P}_0(d_k = 1) = \Omega(k^{-2})$. 

■
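
As an illustration of the bounds used in this proof, consider the constant function $p(b) \equiv 2$ on $[0, 1]$. This is an assumed example, not one taken from the analysis above; it gives the valid densities $f^1(b) = 2b$ and $f^0(b) = 2(1-b)$. The tail probability and the Jensen step can then be evaluated in closed form:

```latex
\begin{align*}
1 - G_0(x) &= \int_x^1 2(1-t)\,dt = (1-x)^2
  \;\ge\; \frac{p(1)(1-x)^2}{4} = \frac{(1-x)^2}{2}, \\
\mathbb{P}_0(d_k = 1) &= \mathbb{E}_0\big[1 - G_0(1 - b_k)\big] = \mathbb{E}_0\big[b_k^2\big]
  \;\ge\; \big(\mathbb{E}_0[b_k]\big)^2 .
\end{align*}
```

In this example the Type I error probability equals the second moment of the public belief, so a lower bound of order $k^{-1}$ on $b_k$ translates directly into a lower bound of order $k^{-2}$ on the error probability.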

Note that the convergence rate of the error probability is exactly $\Theta(k^{-2})$ conditioned on $\mathcal{H}$, i.e., the upper 
bound for the convergence rate is achieved given $\mathcal{H}$. Assume that $p(0) > 0$ and $p$ is continuous at 0. 
Then, we can use the same method to calculate the decay rate of the Type II error probability, which is 
the same as that of the Type I error probability. Note that the decay rate of the error probability depends 
linearly on $(1 - 2q_k)^{-2}$. 

2) Asymptotically uninformative nodes: In this part, we consider the case where $q_k \to 1/2$ as $k \to 
\infty$, which means that the broadcasted decisions become asymptotically uninformative. Let $Q_k = (1 - 
2q_k)/(1 - q_k)$. Note that $q_k \to 1/2$ implies that $Q_k \to 0$. This parameter measures how "informative" the 
corrupted decision is: For example, if $q_k = 0$ (where there is no flipping), then the decision is maximally 
informative in terms of updating the public belief. However, if $q_k = 1/2$, in which case $Q_k = 0$, then the 
decision is completely uninformative in terms of updating the public belief. 
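
The two endpoints described above can be checked with a one-line helper. This is a minimal sketch; the function name `informativeness` is a hypothetical label for $Q_k$, and the restriction of the flipping probability to $[0, 1/2]$ is an assumption made for the example.

```python
def informativeness(q):
    """Return Q = (1 - 2q) / (1 - q) for a flipping probability q in [0, 1/2]."""
    if not 0.0 <= q <= 0.5:
        raise ValueError("flipping probability is assumed to lie in [0, 1/2]")
    return (1.0 - 2.0 * q) / (1.0 - q)


for q in (0.0, 0.25, 0.45, 0.5):
    print(q, informativeness(q))
```

The value decreases monotonically from 1 at $q_k = 0$ to 0 at $q_k = 1/2$.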

We will derive a necessary condition on the decay rate of $Q_k$ to 0 for the public belief $b_k$ to converge 
to 0 under $H_0$, which gives us a necessary condition on $Q_k$ for asymptotic learning. For any sequence 
that evolves according to (9), the following lemma characterizes necessary and sufficient conditions such 
that the sequence converges to 0. 

Lemma 4: Suppose that a non-negative sequence $\{c_k\}$ follows $c_{k+1} = c_k(1 - a_k c_k^n)$, where $n \ge 1$, 
$c_1 > 0$, and $a_k > 0$. Then, $c_k$ converges to 0 if and only if there exists $k_0$ such that $\sum_{k=k_0}^{\infty} a_k = \infty$. 
Proof: We will use the following claim to prove the lemma: For a non-negative sequence satisfying 
$c_{k+1} = c_k(1 - r_k)$, where $c_1 > 0$ and $r_k \in [0, 1)$, we have $c_k \to 0$ if and only if there exists $k_0$ such 
that $\sum_{k=k_0}^{\infty} r_k = \infty$. To show this claim, we have 

$$c_{k+1} = c_1 \prod_{i=1}^{k} (1 - r_i).$$

Applying the natural logarithm, we obtain 

$$\ln c_{k+1} = \ln c_1 + \sum_{i=1}^{k} \ln(1 - r_i).$$

From the above equation, we have $c_k \to 0$ if and only if $\sum_{i=1}^{\infty} \ln(1 - r_i) = -\infty$. In the case where there 
exists a subsequence of $\{r_k\}$ such that the subsequence is bounded away from 0, we have $\sum_{i=1}^{\infty} \ln(1 - 
r_i) = -\infty$. Therefore, $c_k \to 0$ as $k \to \infty$. In the case where $r_k \to 0$, there exists $k_0$ such that 
$r_i \le -\ln(1 - r_i) \le 2r_i$ for all $i \ge k_0$. Therefore, we have $c_k \to 0$ if and only if $\sum_{k=k_0}^{\infty} r_k = \infty$. 

We now show the lemma. First we show that the condition is necessary. Suppose that $c_k \to 0$. Then, 
we have $\sum_{k=1}^{\infty} a_k c_k^n = \infty$. Since $c_k^n < 1$, we have $\sum_{k=1}^{\infty} a_k = \infty$. Second we show by contradiction 
that the condition is sufficient. Suppose that there exists $k_0$ such that $\sum_{k=k_0}^{\infty} a_k = \infty$ and $c_k$ does not 
converge to 0. Since $c_k$ is monotone decreasing, $c_k$ must converge to a nonzero limit $c$. Therefore, for 
all $k$, we have $c_k \ge c$. Then, we have $c_{k+1} \le c_k(1 - a_k c^n)$. We have $\sum_{k=k_0}^{\infty} a_k c^n = c^n \sum_{k=k_0}^{\infty} a_k = \infty$. 
Therefore, by the claim, we have $c_k \to 0$, which contradicts the assumption. 

■ 
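
The dichotomy in Lemma 4 can be seen in a small experiment. The sketch below compares a non-summable choice $a_k = 1/k$ with a summable choice $a_k = 1/k^2$; both choices, the starting value and the function name are assumptions made for the illustration.

```python
def iterate(a_of_k, n, c1, steps):
    """Return c_steps for the recursion c_{k+1} = c_k (1 - a_k c_k^n), c_1 = c1."""
    c = c1
    for k in range(1, steps):
        c = c * (1.0 - a_of_k(k) * c ** n)
    return c


# sum of 1/k diverges, so c_k should drift to 0 (slowly, roughly like 1/log k)
divergent = iterate(lambda k: 1.0 / k, n=1, c1=0.5, steps=100_000)
# sum of 1/k^2 converges, so c_k should settle at a strictly positive limit
summable = iterate(lambda k: 1.0 / k ** 2, n=1, c1=0.5, steps=100_000)

print(divergent, summable)
```

With the summable sequence the iterate stays above 0.2 no matter how long the recursion is run, whereas with $a_k = 1/k$ it keeps decreasing.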

Theorem 7: Suppose that there exists $p > 1$ such that 

$$Q_k = O\left(\frac{1}{k(\log k)^p}\right).$$

Then, the public belief converges to a nonzero limit almost surely. 

Proof: Suppose that there exists $p > 1$ such that $Q_k = O(1/(k(\log k)^p))$. Then, $\sum_{k=1}^{\infty} Q_k < \infty$. 
Therefore by Lemma 4, $b_k$ in (9) does not converge to 0. Recall that (9) represents the recursion of $b_k$ 
conditioned on $\mathcal{H}$, i.e., the node decisions are all 0. Therefore, the public belief is the smallest among 
all possible outcomes. Hence, the public belief converges to a nonzero limit almost surely. 

■ 
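
The summability step in this proof can be verified with the integral test. A short derivation, starting the sum at $k = 3$ so that the comparison with the integral over $[2, \infty)$ is valid:

```latex
\sum_{k=3}^{\infty} \frac{1}{k(\log k)^p}
\;\le\; \int_{2}^{\infty} \frac{dx}{x(\log x)^p}
= \left[ \frac{(\log x)^{1-p}}{1-p} \right]_{2}^{\infty}
= \frac{(\log 2)^{1-p}}{p-1} < \infty
\qquad \text{for } p > 1 .
```

For $p \le 1$ the same integral diverges, which is why the threshold between the two regimes sits at $p = 1$.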

By (12), it is evident that if $b_k$ converges to a nonzero limit almost surely, then $\mathbb{P}_0(d_k = 1)$ is bounded 
away from 0 and $\mathbb{P}_0(d_k = 0)$ is bounded away from 1. Therefore, the system does not asymptotically 
learn the underlying truth. Hence Theorem 7 provides a necessary condition for asymptotic learning. 
Theorem 7 also implies that for there to be a nonzero probability that the public belief converges to 
zero, there must exist $p \le 1$ such that $Q_k = \Omega(1/(k(\log k)^p))$. If the public belief does not 
converge to zero, then it is impossible for there to be an eventual collective arrival at the true hypothesis. 
To explain this further, recall that $\mathcal{H}$ denotes the event that there exists a (random) $k_0$ such that $d_k = 0$ 
for all $k > k_0$. We use $\mathcal{L}$ to denote the event $\{b_k \to 0\}$. Notice that $\mathcal{H}$ occurs only if $\mathcal{L}$ occurs. Hence, 
$\mathcal{H}$ is a subset of the event that $b_k \to 0$, i.e., $\mathcal{H} \subset \mathcal{L}$. This leads to the following corollary of Theorem 7. 

Corollary 4: If $Q_k = O(1/(k(\log k)^p))$ for some $p > 1$, then $\mathbb{P}(\mathcal{H}) = 0$. 

So, by the corollary above, only if $Q_k = \Omega(1/(k(\log k)^p))$ for some $p \le 1$ can we hope for there to be 
a nonzero probability that $b_k \to 0$ and thus of information cascade to the truth. Even under the situation 
that $b_k \to 0$, i.e., conditioned on $\mathcal{L}$, we expect that the rate at which $b_k \to 0$ depends on the scaling law 
of $Q_k$. The following theorem relates the scaling laws of $\{Q_k\}$ with those of $\{b_k\}$ and the Type I error 
probability sequence $\{\mathbb{P}_0(d_k = 1)\}$. 

Theorem 8: Conditioned on $\mathcal{L}$, we have the following: 

(i) Suppose that $Q_k = \Theta(1/k^{1-p})$ where $p \in (0, 1)$. Then, $b_k = \Omega(k^{-p})$ almost surely and $\mathbb{P}_0(d_k = 
1) = \Omega(k^{-2p})$. 

(ii) Suppose that $Q_k = \Theta(1/k)$. Then, $b_k = \Omega(1/\log k)$ almost surely and $\mathbb{P}_0(d_k = 1) = \Omega(1/(\log k)^2)$. 

(iii) Suppose that $Q_k = \Theta(1/(k(\log k)^p))$ where $p \in (0, 1)$. Then, $b_k = \Omega(1/(\log k)^q)$ almost surely, 
where $1/q + 1/p = 1$, and $\mathbb{P}_0(d_k = 1) = \Omega(1/(\log k)^{2q})$. 

(iv) Suppose that $Q_k = \Theta(1/(k \log k))$. Then, $b_k = \Omega(1/\log\log k)$ almost surely and $\mathbb{P}_0(d_k = 1) = 
\Omega(1/(\log\log k)^2)$. 

Proof: The proof is given in Appendix D. 

■ 

Note that Theorem 8 provides upper bounds for the convergence rates of the public belief and error 
probability. It is easy to show that conditioned on $\mathcal{H}$, these upper bounds for the convergence rates are 
achieved. However, recall that $\mathcal{H}$ is a subset of the event that $b_k \to 0$. Therefore, even if $b_k \to 0$ with 
certain probability, the probability of $\mathcal{H}$ is not guaranteed to be nonzero. Next we provide a necessary 
condition such that the probability of $\mathcal{H}$ is nonzero. 

Theorem 9: Suppose that there exists $p > 1$ such that 

$$Q_k = O\left(\frac{(p + \log k)(\log k)^{p-1}}{(k(\log k)^p)^{1/2}}\right).$$

Then, we have $\mathbb{P}(\mathcal{H}) = 0$. 
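Proof: We first state a key lemma which is a corollary of the Borel-Cantelli lemma [24]. Consider 
a probability space $(S, \mathcal{S}, \mathcal{P})$ and a sequence of events $\{\mathcal{E}_k\}$ in $\mathcal{S}$. We define the limit superior of 
$\{\mathcal{E}_k\}$ as follows: 

$$\limsup_{k \to \infty} \mathcal{E}_k = \bigcap_{k=1}^{\infty} \left( \bigcup_{n=k}^{\infty} \mathcal{E}_n \right).$$

Note that this is the event that infinitely many of the $\mathcal{E}_k$ occur. We use $\mathcal{E}_k^C$ to denote the complement of 
$\mathcal{E}_k$. 

Lemma 5: Suppose that 

$$\sum_{k=1}^{\infty} \mathcal{P}(\mathcal{E}_k \mid \mathcal{E}_{k-1}^C, \mathcal{E}_{k-2}^C, \ldots, \mathcal{E}_1^C) = \infty.$$

Then, 

$$\mathcal{P}\Big(\limsup_{k \to \infty} \mathcal{E}_k\Big) = 1.$$

The proof of this lemma is omitted. Now we prove the theorem. Let $\mathcal{E}_k$ be the event that $d_k = 1$, i.e., 
$a_k$ makes the wrong decision given $H_0$. Notice that $\mathcal{E}_k^C$ is the event that $d_k = 0$. If 

$$Q_k = O\left(\frac{(p + \log k)(\log k)^{p-1}}{(k(\log k)^p)^{1/2}}\right),$$

then using an analysis similar to that in Theorem 8, we have 

$$\mathbb{P}_0(\mathcal{E}_k \mid \mathcal{E}_{k-1}^C, \mathcal{E}_{k-2}^C, \ldots, \mathcal{E}_1^C) = \Omega\left(\frac{1}{k(\log k)^p}\right).$$

This implies that these terms are not summable, i.e., 
$\sum_{k=1}^{\infty} \mathbb{P}_0(\mathcal{E}_k \mid \mathcal{E}_{k-1}^C, \mathcal{E}_{k-2}^C, \ldots, \mathcal{E}_1^C) = \infty$. Therefore 
we have $\mathbb{P}_0(\limsup_{k \to \infty} \mathcal{E}_k) = 1$, which means that with probability 1, $d_k = 1$ occurs for infinitely many 
$k$. Consequently, we have $\mathbb{P}_0(\mathcal{H}) = 0$. By symmetry, $\mathbb{P}_1(\mathcal{H}) = 0$. This concludes the proof. 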

■ 

Suppose that the flipping probability converges to 1/2 sufficiently fast. Then, even if the public belief 
converges to 0, its convergence is very slow because the broadcasted decisions become uninformative 
at a fast rate. In this case, the private signals are capable of overcoming the public belief infinitely often 
because of the slow convergence rate of the public belief. 

3) Polynomial tail density: We now consider the case where the private belief has polynomial tail 
densities, that is, $p(b) \to 0$ as $b \to 1$ and there exist constants $\alpha, c > 0$ such that 

$$\lim_{b \to 1} \frac{p(b)}{(1 - b)^{\alpha}} = c. \tag{13}$$

Note that $\alpha$ denotes the leading exponent of the Taylor expansion of the density at 1. The larger the 
value of $\alpha$, the thinner the tail density. Note that Theorem 7 (necessary condition for $\mathbb{P}(\mathcal{L}) > 0$), which 
was stated under the constant density assumption, is also valid in the polynomial tail density case. We 
can use an analysis similar to the one above to derive the explicit relationship between the convergence rate of 
$Q_k$ and the convergence rate of the public belief conditioned on $\mathcal{L}$. The following theorem establishes 
the scaling laws of the public belief and Type I error probability for both the uniformly informative and 
asymptotically uninformative cases. 

Theorem 10: Consider the polynomial tail density defined in (13). 

1) Uniformly informative case: Suppose that the flipping probabilities are bounded away from 1/2. 
Then, we have $b_k = \Omega(k^{-1/(\alpha+1)})$ almost surely and $\mathbb{P}_0(d_k = 1) = \Omega(k^{-(\alpha+2)/(\alpha+1)})$. 

2) Asymptotically uninformative case: Suppose that the flipping probabilities converge to 1/2, i.e., 
$Q_k \to 0$. Conditioned on $\mathcal{L}$, we have 

(i) if $Q_k = \Theta(1/k^{1-p})$ where $p \in (0, 1)$, then $b_k = \Omega(k^{-p/(\alpha+1)})$ almost surely and $\mathbb{P}_0(d_k = 1) = 
\Omega(k^{-(\alpha+2)p/(\alpha+1)})$, 

(ii) if $Q_k = \Theta(1/k)$, then $b_k = \Omega((\log k)^{-1/(\alpha+1)})$ almost surely and $\mathbb{P}_0(d_k = 1) = \Omega((\log k)^{-(\alpha+2)/(\alpha+1)})$, 

(iii) if $Q_k = \Theta(1/(k(\log k)^p))$ where $p \in (0, 1)$, then $b_k = \Omega((\log k)^{-q/(\alpha+1)})$ almost surely, where 
$1/q + 1/p = 1$, and $\mathbb{P}_0(d_k = 1) = \Omega((\log k)^{-(\alpha+2)q/(\alpha+1)})$, 

(iv) if $Q_k = \Theta(1/(k \log k))$, then $b_k = \Omega((\log\log k)^{-1/(\alpha+1)})$ almost surely and 
$\mathbb{P}_0(d_k = 1) = \Omega((\log\log k)^{-(\alpha+2)/(\alpha+1)})$. 

Proof: The proof is given in Appendix E. 

■ 

Note that these upper bounds for the convergence rates of $b_k$ and $\mathbb{P}_0(d_k = 1)$ are achieved conditioned on 
$\mathcal{H}$. Next we provide a necessary condition such that $\mathcal{H}$ has nonzero probability. 
Theorem 11: Suppose that there exists $p > 1$ such that 

$$Q_k = O\left(\frac{(p + \log k)(\log k)^{p-1}}{(k(\log k)^p)^{1/(\alpha+2)}}\right).$$

Then, we have $\mathbb{P}(\mathcal{H}) = 0$. 

Proof: The proof is similar to that of Theorem 9 and is omitted. 

■ 

Note that as $\alpha$ gets larger, this necessary condition states that $Q_k$ has to decay very slowly in order 
that it is possible for $\mathcal{H}$ to occur. 

Similarly we can calculate the decay rate for the Type II error probability $\mathbb{P}_1(d_k = 0)$. Assume that 
the tail density is given by $\lim_{b \to 0} p(b)/b^{\beta} = c$ where $\beta, c > 0$. Then, we can show that if the flipping 
probabilities are bounded away from 1/2, then $\mathbb{P}_1(d_k = 0) = \Omega(k^{-(\beta+2)/(\beta+1)})$. The decay rate of the 
error probability is given by 

$$\mathbb{P}_e^k = \Omega\left(k^{-(1 + 1/(\max(\alpha, \beta) + 1))}\right),$$

where the upper bound on the rate is achieved conditioned on $\mathcal{H}$. 

V. Concluding Remarks 

We have studied the sequential hypothesis testing problem in two types of broadcast failures: erasure 
and flipping. In both cases, if the memory sizes are bounded, then there does not exist a decision strategy 
such that the error probability converges to 0. In the case of random erasure, if the memory size goes 
to infinity, then there exists a decision strategy such that the error probability converges to 0, even 
if the erasure probability converges to 1. We also characterize explicitly the relationship between the 
convergence rate of the error probability and the convergence rate of the memory. In the case of random 
flipping, if each node observes all the previous decisions, then with the myopic decision strategy, the 
error probability converges to 0, when the flipping probabilities are bounded away from 1/2. In the case 
where the flipping probability converges to 1/2, we derive a necessary condition on the convergence rate 
of the flipping probability such that the error probability converges to 0. We also characterize explicitly 
the relationship between the convergence rate of the flipping probability and the convergence rate of the 
error probability. Finally, we have derived a necessary condition such that the herding event has nonzero 
probability. 

Our analysis leads to several open questions. In the case of random flipping, we have not studied the 
case where the memory size goes to infinity but each node cannot observe all the previous decisions. We 
also want to generalize the techniques used in this paper to more general network topologies. Moreover, 
besides erasure and flipping failures, we expect that our techniques can be used in the additive Gaussian 
noise scenario. When the signal-to-noise ratios (SNR) are bounded away from 0, the martingale convergence proof in Lemma 2 
easily generalizes to this scenario. However, if the SNR goes to 0 (e.g., the fading coefficient goes to 
0, the noise variance goes to infinity, or the broadcasting signal power goes to 0), the error probability 
does not necessarily converge to 0. We want to derive necessary and sufficient conditions 
on the convergence rate of the SNR such that the error probability still converges to 0. 
Appendix A 
Proof of Theorem 3 

We extend the proof to the case where each node observes $m_k \ge 1$ previous decisions. The likelihood 
ratio test in this case is given by 

$$d_k = \begin{cases} 1 & \text{if } L(X_k) > t(\hat{d}_{k-1}, \hat{d}_{k-2}, \ldots, \hat{d}_{k-m_k}), \\ 0 & \text{if } L(X_k) \le t(\hat{d}_{k-1}, \hat{d}_{k-2}, \ldots, \hat{d}_{k-m_k}), \end{cases}$$

where $t(\hat{d}_{k-1}, \hat{d}_{k-2}, \ldots, \hat{d}_{k-m_k}) = t_k / L(\hat{d}_{k-1}, \hat{d}_{k-2}, \ldots, \hat{d}_{k-m_k})$ denotes the testing threshold. Among 
all possible combinations of $\{\hat{d}_{k-1}, \hat{d}_{k-2}, \ldots, \hat{d}_{k-m_k}\}$, it suffices to assume that the likelihood ratio in 
the case where each decision equals 0 (denoted by $0^{m_k}$) is the smallest and that in the case where each 
decision equals 1 (denoted by $1^{m_k}$) is the largest. Otherwise, we can always find the smallest and largest 
likelihood ratio. The case where the likelihood ratios for all possible combinations are equal can be 
excluded because it means the decisions observed have no useful information for hypothesis testing, and 
the node has to make a decision based on its own measurement, in which case the error probability does 
not converge to 0. 

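From these, we can define the Type I and II error probabilities as follows: 

$$\begin{aligned}
\mathbb{P}_0(d_k = 1) ={}& \mathbb{P}_0(L(X_k) > t_k(0^{m_k}))\,\mathbb{P}_0(\hat{d}_{k-1} = 0, \hat{d}_{k-2} = 0, \ldots, \hat{d}_{k-m_k} = 0) \\
&+ \mathbb{P}_0(L(X_k) > t_k(1, 0, 0, \ldots, 0))\,\mathbb{P}_0(\hat{d}_{k-1} = 1, \hat{d}_{k-2} = 0, \ldots, \hat{d}_{k-m_k} = 0) + \ldots \\
&+ \mathbb{P}_0(L(X_k) > t_k(1^{m_k}))\,\mathbb{P}_0(\hat{d}_{k-1} = 1, \hat{d}_{k-2} = 1, \ldots, \hat{d}_{k-m_k} = 1) \\
={}& \mathbb{P}_0(L(X_k) > t_k(0^{m_k})) \\
&+ \mathbb{P}_0(t_k(1, 0, 0, \ldots, 0) < L(X_k) \le t_k(0^{m_k}))\,\mathbb{P}_0(\hat{d}_{k-1} = 1, \hat{d}_{k-2} = 0, \ldots, \hat{d}_{k-m_k} = 0) + \ldots \\
&+ \mathbb{P}_0(t_k(1^{m_k}) < L(X_k) \le t_k(0^{m_k}))\,\mathbb{P}_0(\hat{d}_{k-1} = 1, \hat{d}_{k-2} = 1, \ldots, \hat{d}_{k-m_k} = 1)
\end{aligned}$$

and 

$$\begin{aligned}
\mathbb{P}_1(d_k = 0) ={}& \mathbb{P}_1(L(X_k) \le t_k(0^{m_k}))\,\mathbb{P}_1(\hat{d}_{k-1} = 0, \hat{d}_{k-2} = 0, \ldots, \hat{d}_{k-m_k} = 0) \\
&+ \mathbb{P}_1(L(X_k) \le t_k(1, 0, 0, \ldots, 0))\,\mathbb{P}_1(\hat{d}_{k-1} = 1, \hat{d}_{k-2} = 0, \ldots, \hat{d}_{k-m_k} = 0) + \ldots \\
&+ \mathbb{P}_1(L(X_k) \le t_k(1^{m_k}))\,\mathbb{P}_1(\hat{d}_{k-1} = 1, \hat{d}_{k-2} = 1, \ldots, \hat{d}_{k-m_k} = 1) \\
={}& \mathbb{P}_1(t_k(1^{m_k}) < L(X_k) \le t_k(0^{m_k}))\,\mathbb{P}_1(\hat{d}_{k-1} = 0, \hat{d}_{k-2} = 0, \ldots, \hat{d}_{k-m_k} = 0) \\
&+ \mathbb{P}_1(t_k(1^{m_k}) < L(X_k) \le t_k(1, 0, 0, \ldots, 0))\,\mathbb{P}_1(\hat{d}_{k-1} = 1, \hat{d}_{k-2} = 0, \ldots, \hat{d}_{k-m_k} = 0) + \ldots \\
&+ \mathbb{P}_1(L(X_k) \le t_k(1^{m_k})).
\end{aligned}$$

With an argument similar to that in the tandem network case, we have 

$$\mathbb{P}_e^k = \pi_0\,\mathbb{P}_0(d_k = 1) + \pi_1\,\mathbb{P}_1(d_k = 0).$$

Suppose that $\mathbb{P}_e^k \to 0$ as $k \to \infty$. Then, we must have $\mathbb{P}_0(L(X_k) > t_k(0^{m_k})) \to 0$ and $\mathbb{P}_1(L(X_k) \le 
t_k(1^{m_k})) \to 0$. Recall that $\mathbb{P}_0$ and $\mathbb{P}_1$ are equivalent measures. Hence we have $\mathbb{P}_1(L(X_k) > t_k(0^{m_k})) \to 0$ 
and $\mathbb{P}_0(L(X_k) \le t_k(1^{m_k})) \to 0$. These imply that $\mathbb{P}_j(t_k(1^{m_k}) < L(X_k) \le t_k(0^{m_k})) \to 1$ for $j = 0, 1$. 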
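We also have 

$$\mathbb{P}_j(\hat{d}_{k-1}, \hat{d}_{k-2}, \ldots, \hat{d}_{k-m_k}) = 
\mathbb{P}_j(\hat{d}_{k-1} \mid \hat{d}_{k-2}, \ldots, \hat{d}_{k-m_k})\,\mathbb{P}_j(\hat{d}_{k-2} \mid \hat{d}_{k-3}, \ldots, \hat{d}_{k-m_k}) \cdots \mathbb{P}_j(\hat{d}_{k-m_k+1} \mid \hat{d}_{k-m_k})\,\mathbb{P}_j(\hat{d}_{k-m_k}).$$

We already know that $\mathbb{P}_j(\hat{d}_{k-m_k})$ is bounded away from 0 by $q_k$. Similarly, we can show 

$$\begin{aligned}
\mathbb{P}_j(\hat{d}_{k-i} \mid \hat{d}_{k-i-1}, \ldots, \hat{d}_{k-m_k}) 
&= q_k\big(1 - \mathbb{P}_j(d_{k-i} \mid \hat{d}_{k-i-1}, \ldots, \hat{d}_{k-m_k})\big) + (1 - q_k)\,\mathbb{P}_j(d_{k-i} \mid \hat{d}_{k-i-1}, \ldots, \hat{d}_{k-m_k}) \\
&= q_k + (1 - 2q_k)\,\mathbb{P}_j(d_{k-i} \mid \hat{d}_{k-i-1}, \ldots, \hat{d}_{k-m_k}).
\end{aligned}$$

Hence $\mathbb{P}_e^k$ is also bounded below by $q_k^{m_k}$, which stays bounded away from 0 for bounded memory sizes $m_k$. This contradiction implies that $\mathbb{P}_e^k$ does not converge to 
0 with any decision strategy. 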
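
The lower bound used in this argument rests on the identity $q(1 - P) + (1 - q)P = q + (1 - 2q)P$, which keeps the probability of any broadcast value at least $\min(q, 1-q)$. A minimal sketch of this check follows; the function name and the grid of test values are illustrative assumptions.

```python
def broadcast_prob(p, q):
    """Probability that the broadcast bit equals x when the decision equals x
    with probability p and the bit is flipped with probability q.

    Equals q (1 - p) + (1 - q) p = q + (1 - 2q) p.
    """
    return q * (1.0 - p) + (1.0 - q) * p


q = 0.2
values = [broadcast_prob(i / 100.0, q) for i in range(101)]
print(min(values), max(values))
```

However extreme the underlying decision probability is, the broadcast probability stays within $[q, 1 - q]$ when $q \le 1/2$, so a product of $m_k$ such factors is at least $q^{m_k}$.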

Appendix B 
Proof of Lemma 2 

Assume that $A = \{(X_n = x_n, \hat{d}_n = j_n) : L_\infty > 0\}$ has positive probability, where $j_n \in \{0, 1\}$. Here 
$(x_n, j_n)$ represents an element of the sample space of measurements and "corrupted" decisions, which 
define the $\sigma$-algebras $\mathcal{F}_n$. For the public likelihood ratio, we have the following recursion: 

$$L_{n+1} = \frac{\mathbb{P}_1(D_{n+1})}{\mathbb{P}_0(D_{n+1})} = \frac{\mathbb{P}_1(\hat{d}_{n+1} \mid D_n)}{\mathbb{P}_0(\hat{d}_{n+1} \mid D_n)}\, L_n. \tag{14}$$

By the stationarity property (see [19]) of this martingale process, any $\ell$ in the support of $L_\infty$ is 
a fixed point of the process described in (14). Therefore, we have 

$$\frac{\mathbb{P}_1(\hat{d}_{n+1} \mid D_n)}{\mathbb{P}_0(\hat{d}_{n+1} \mid D_n)} = 1. \tag{15}$$

Now 

$$\frac{\mathbb{P}_1(\hat{d}_{n+1} \mid D_n)}{\mathbb{P}_0(\hat{d}_{n+1} \mid D_n)} = \frac{\sum_{d_{n+1}} \mathbb{P}_1(d_{n+1} \mid D_n)\,\mathbb{P}(\hat{d}_{n+1} \mid d_{n+1})}{\sum_{d_{n+1}} \mathbb{P}_0(d_{n+1} \mid D_n)\,\mathbb{P}(\hat{d}_{n+1} \mid d_{n+1})} 
= \frac{\mathbb{P}_1(d_{n+1} \mid D_n)(1 - 2q_k) + q_k}{\mathbb{P}_0(d_{n+1} \mid D_n)(1 - 2q_k) + q_k}. \tag{16}$$

Equation (16) together with (15) implies 

$$\frac{\mathbb{P}_1(d_{n+1} \mid D_n)}{\mathbb{P}_0(d_{n+1} \mid D_n)} = 1. \tag{17}$$

Now the statement $d_{n+1} = 0$ is equivalent to 

$$\frac{d\mathbb{P}_1(X_{n+1})}{d\mathbb{P}_0(X_{n+1})} \le \frac{\pi_0}{\pi_1 L_n}.$$

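Since the private signals are identically distributed, the likelihood ratio of the private signals is independent 
of $n$ under each of $H_0$ and $H_1$. Hence we can rewrite the test as 

$$L(X) = \frac{d\mathbb{P}_1(X)}{d\mathbb{P}_0(X)} \le \frac{\pi_0}{\pi_1 L_n}.$$

Thus (17) is equivalent to 

$$\mathbb{P}_0\left(L(X) \le \frac{\pi_0}{\pi_1 L_\infty}\right) = \mathbb{P}_1\left(L(X) \le \frac{\pi_0}{\pi_1 L_\infty}\right) \tag{18}$$

for all possible choices of infinite sequences $\omega \in A$. Expressing the private likelihood ratio in terms of the private belief, 
(18) is also equivalent to 

$$G_0\big((1 + L_\infty(\omega))^{-1}\big) = G_1\big((1 + L_\infty(\omega))^{-1}\big).$$

However, by the dominance property of $(G_0, G_1)$, this is true if and only if $L_\infty(\omega) = 0$ or $\infty$. Also by the fact that a non-negative 
martingale converges to a finite limit $\mathbb{P}$-almost surely, we have $L_\infty = 0$ almost surely. 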
Appendix C 
Proof of Lemma 3 

First it is easy to see that $c_k \to 0$ because 0 is the only fixed point of the recursion. To show the 
convergence rate, we approximate the recursion (9) by an ordinary differential equation (ODE). Therefore, we 
have 

$$\frac{dc_k}{dk} = -a c_k^{n+1}.$$

The solution to this ODE is, for some $C > 0$, 

$$c_k = \frac{C}{(ak)^{1/n}}.$$

Therefore, for sufficiently large $k$, there exist two constants $C_1$ and $C_2$ such that 

$$\frac{C_1}{(ak)^{1/n}} \le c_k \le \frac{C_2}{(ak)^{1/n}},$$

which implies that 

$$c_k = \Theta(k^{-1/n}).$$
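
For completeness, the ODE solution quoted above follows from separation of variables. The worked steps below treat $k$ as a continuous variable and ignore the initial condition, which only affects the additive constant:

```latex
\begin{align*}
\frac{dc}{dk} = -a\,c^{n+1}
\;&\Longrightarrow\; c^{-(n+1)}\,dc = -a\,dk \\
\;&\Longrightarrow\; \frac{c^{-n}}{n} = a k + \text{const} \\
\;&\Longrightarrow\; c_k = \big(n a k + \text{const}'\big)^{-1/n}
   \;\sim\; \frac{n^{-1/n}}{(ak)^{1/n}} \quad \text{as } k \to \infty .
\end{align*}
```

Under this approximation the constant in the solution is $C = n^{-1/n}$, and the additive constant becomes negligible for large $k$.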



Appendix D 
Proof of Theorem 8 

(i). Suppose that $Q_k = \Theta(1/k^{1-\epsilon})$ where $\epsilon \in (0, 1)$ (with $\epsilon$ playing the role of $p$ in claim (i)). Conditioned on $\mathcal{H}$, we have recursion (9) for the 
public belief $b_k$. Using this recursion, we can get similar results as those in Lemma 3, that is, there exist 
$C_1 > 0$ and $C_2 > 0$ such that 

$$\frac{C_1}{kQ_k} \le b_k \le \frac{C_2}{kQ_k}. \tag{19}$$

Plugging in the convergence rate of $Q_k$ in (19) establishes the claim. 

(ii)-(iv). Suppose that $Q_k = \Theta(1/(k(\log k)^p))$, where $p \in [0, 1]$. Then, by (9), we have 

$$b_{k+1} \simeq b_k - \frac{C\, b_k^2}{k(\log k)^p}$$

for some constant $C > 0$. We approximate this recursion by an ordinary differential equation (ODE). For $p = 0$, the solution to 
this ODE satisfies $b_k = \Theta(1/\log k)$, which proves (ii). When $p \in (0, 1)$, the solution satisfies $b_k = 
\Theta(1/(\log k)^q)$, where $1/q + 1/p = 1$. This establishes (iii). Finally, when $p = 1$, the solution satisfies 
$b_k = \Theta(1/\log\log k)$. Note that all these rates are derived conditioned on $\mathcal{H}$. By the fact that conditioned 
on $\mathcal{H}$, the decay rate is the fastest among all outcomes, we obtain the desired results. 

Having established the convergence rate of $b_k$, the convergence rate for the error probability in each 
claim follows from (12). 
Appendix E 
Proof of Theorem 10 

Proof of claim 1: If the flipping probabilities are bounded away from 1/2, then the public belief $b_k$ 
converges to 0 and conditioned on $\mathcal{H}$ we have 

$$\mathbb{P}_1(d_{k+1} = 0 \mid D_k) = 1 - \int_{1-b_k}^{1} f^1(x)\,dx \simeq 1 - \frac{c}{\alpha}\, b_k^{\alpha+1} \tag{20}$$

and 

$$\mathbb{P}_0(d_{k+1} = 0 \mid D_k) = 1 - \int_{1-b_k}^{1} f^0(x)\,dx \simeq 1 - \frac{c}{\alpha+1}\, b_k^{\alpha+2}. \tag{21}$$

We can also calculate the (conditional) Type I error probability in this case: 

$$\mathbb{P}_0(d_{k+1} = 1 \mid D_k) = 1 - \mathbb{P}_0(d_{k+1} = 0 \mid D_k) = \int_{1-b_k}^{1} f^0(x)\,dx \simeq \frac{c}{\alpha+1}\, b_k^{\alpha+2}. \tag{22}$$

Note that (22) describes the relationship between the decay rate of the Type I error probability and the decay 
rate of $b_k$. Next we derive the decay rate of $b_k$. 

By (20) and (21), we can derive the recursion for the public belief as follows: 

$$b_{k+1} = b_k - \frac{c}{\alpha}\, Q_k\, b_k^{\alpha+2}. \tag{23}$$

By Lemma 3, we know that $b_k \to 0$ and the decay rate is $b_k = \Theta(k^{-1/(\alpha+1)})$. Recall that conditioned 
on $\mathcal{H}$, the convergence of $b_k$ is the fastest. Therefore, we have 

$$b_k = \Omega(k^{-1/(\alpha+1)})$$

almost surely. From (22) and invoking Jensen's inequality, we obtain 

$$\mathbb{P}_0(d_k = 1) \ge \frac{c}{\alpha+1}\,\mathbb{E}_0[b_k^{\alpha+2}] \ge \frac{c}{\alpha+1}\,(\mathbb{E}_0[b_k])^{\alpha+2}. \tag{24}$$

Because $b_k = \Omega(k^{-1/(\alpha+1)})$ almost surely, we have 

$$\mathbb{P}_0(d_k = 1) = \Omega(k^{-(\alpha+2)/(\alpha+1)}).$$

Note that $\mathbb{P}_0(d_k = 1) = \Theta(k^{-(\alpha+2)/(\alpha+1)})$ conditioned on $\mathcal{H}$. 

Proof of claim 2: Using Lemma 3, we can show that there exist two positive constants $C_1$ and $C_2$ 
such that 

$$\frac{C_1}{(kQ_k)^{1/(\alpha+1)}} \le b_k \le \frac{C_2}{(kQ_k)^{1/(\alpha+1)}}. \tag{25}$$

Therefore, if $Q_k = 1/k^{1-p}$, then using (25) and the fact that $b_k$ given $\mathcal{H}$ is the smallest among all 
possible outcomes, we have $b_k = \Omega(k^{-p/(\alpha+1)})$. This establishes (i). For (ii)-(iv), we can solve the ODEs 
given by (23) and the solutions give rise to the convergence rates for $b_k$, which in turn characterize the 
convergence rates of the error probabilities. 